Input-output example encoding

ABSTRACT

Generally discussed herein are devices, systems, and methods for encoding input-output examples. A method of generating a program using an encoding of input-output examples may include processing an input example of the input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) previously computed feature vectors for a different input-output example that are sufficiently close to the input feature vector and the output feature vector, respectively, and, using the determined cross-correlation or previously computed vectors, generating a program consistent with the input example and the output example.

BACKGROUND

A number of neural network architectures have been proposed for program induction. Given a set of input-output examples, these architectures may be able to learn mappings that generalize to new test inputs, such that a desired output may be predicted based on the new test input(s). These architectures have some limitations, such as being computationally expensive, being hard to train, requiring a model to be trained for each task separately, and/or making it difficult to interpret or verify the correctness of a learned mapping.

The desire for better interpretability and scalability of neural network models has motivated research into program synthesis, that is, the automatic construction of interpretable programs in a given domain-specific language (DSL) that are consistent with a given specification (taking the form of, e.g., partial programs, input-output examples, or natural language). Various approaches have been developed to search over the space of possible programs in the DSL; these approaches include, for example, stochastic, constraint-based, and version-space-algebra-based algorithms. Many of these techniques not only take significant engineering and research effort to develop carefully-designed heuristics for efficient search, but are also limited in their range of applicability and the sizes and types of programs they can synthesize.

SUMMARY

This summary section is provided to introduce aspects of embodiments in a simplified form, with further explanation of the embodiments following in the detailed description. This summary section is not intended to identify essential or required features of the claimed subject matter, and the particular combination and order of elements listed in this summary section is not intended to provide limitation to the elements of the claimed subject matter.

A method of generating a program using an encoding of input-output examples includes processing an input example of the input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and, using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

A non-transitory machine-readable medium including instructions for execution by a processor of the machine to perform operations including processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and, using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

A device includes a processor and a memory device coupled to the processor and having a program stored thereon for execution by the processor to perform operations. The operations include processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and, using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates, by way of example, a flow diagram of an embodiment of a technique of generating a program based on input-output examples.

FIG. 2 illustrates, by way of example, a block diagram of a portion of an encoding system.

FIG. 3 illustrates, by way of example, a flow diagram of an embodiment of a system for determining a cross-correlation of input-output examples.

FIG. 4 illustrates, by way of example, a block diagram of an embodiment of a method for encoding input-output examples, such as the input examples and output examples of FIG. 1.

FIG. 5A illustrates, by way of example, a block diagram of an embodiment of a workflow for training neural networks for program synthesis.

FIG. 5B illustrates, by way of example, a block diagram of an embodiment of a workflow for using trained neural networks to synthesize a program in a given DSL based on input-output examples.

FIG. 6A illustrates, by way of example, a block diagram of an embodiment of a recursive pass through an example partial program tree, as may be used in a determination of expansion probabilities.

FIG. 6B illustrates, by way of example, a block diagram of an embodiment of a reverse-recursive pass through the example partial program tree of FIG. 6A, as may be used in the computation of expansion probabilities.

FIG. 7 illustrates, by way of example, a block diagram of an embodiment of a computer system, as may be used for performing methods.

DETAILED DESCRIPTION

In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments which may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the embodiments. It is to be understood that other embodiments may be utilized and that structural, logical, and/or electrical changes may be made without departing from the scope of the embodiments. The following description of embodiments is, therefore, not to be taken in a limited sense, and the scope of the embodiments is defined by the appended claims.

The operations, functions, or algorithms described herein may be implemented in software in some embodiments. The software may include computer executable instructions stored on computer or other machine readable media or storage device, such as one or more non-transitory memories or other type of hardware based storage devices, either local or networked. Further, such functions may correspond to subsystems, which may be software, hardware, firmware, or a combination thereof. Multiple functions may be performed in one or more subsystems as desired, and the embodiments described are merely examples. The software may be executed on a digital signal processor, ASIC, microprocessor, central processing unit (CPU), graphics processing unit (GPU), field programmable gate array (FPGA), or other type of processor operating on a computer system, such as a personal computer, server, or other computer system, turning such computer system into a specifically programmed machine. The functions or algorithms may be implemented using processing circuitry, such as may include electric and/or electronic components (e.g., one or more transistors, resistors, capacitors, inductors, amplifiers, modulators, demodulators, antennas, radios, regulators, diodes, oscillators, multiplexers, logic gates, buffers, caches, memories, or the like).

Discussed herein are embodiments that may include automatically constructing computer programs (e.g., compilable and/or executable programs), detecting anomalies in input-output examples, and/or classifying input-output examples. In one or more embodiments, after one or more neural networks are properly trained, a computer program constrained by a domain-specific language (DSL) may produce output consistent with the input-output examples.

In one or more embodiments, input-output examples may be encoded. Then, the encoded input-output examples may be analyzed to determine if an input-output example does not belong in the set. Such an analysis may include determining a distance between encoded input-output examples, such as between an individual encoded input-output example and all other encoded input-output examples in a set of encoded input-output examples. The determined distances for the individual input-output example may be summed, averaged, or the like to determine a total distance. The total distance may be compared to a threshold. The input-output example may be determined to be anomalous if the total distance is greater than (or equal to) the threshold.

Like input-output examples may be classified using a similar distance-to-threshold comparison. If the total distance is less than (or equal to) a threshold, the input-output example may be determined to be a part of the corresponding set of input-output examples.
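
As an illustration of the distance-based anomaly and classification checks described above, the following is a minimal sketch; the encoding routine is assumed to already exist, and the threshold, dimensionality, and random vectors here are hypothetical placeholders rather than part of any embodiment.

```python
import numpy as np

def total_distance(candidate, others):
    """Sum of distances between one encoded input-output example and every
    other encoded example in the set (a sum is used here; an average works too)."""
    return sum(np.linalg.norm(candidate - other) for other in others)

def is_anomalous(encodings, index, threshold):
    """An example is flagged anomalous when its total distance to the rest of
    the set is greater than (or equal to) the threshold; otherwise it may be
    classified as part of the set."""
    candidate = encodings[index]
    others = [e for i, e in enumerate(encodings) if i != index]
    return total_distance(candidate, others) >= threshold

# Hypothetical usage: five 8-dimensional encoded input-output examples.
rng = np.random.default_rng(0)
encodings = [rng.normal(size=8) for _ in range(5)]
print(is_anomalous(encodings, 0, threshold=25.0))
```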

Embodiments may include implementations of one or more of multiple neural networks. A first neural network, sometimes referred to as a cross-correlation input-output network, may produce a representation of a set of input-output examples, given the set of input-output examples. Another neural network, sometimes referred to as a recursive-reverse-recursive neural network (R3NN), may produce a program, given the representation of the input-output examples. The program may be generated by incrementally expanding partial programs. The effectiveness of this encoding and program production approach may be tested by applying it to regular expression based string transformations. The results of the testing seem to support that the R3NN is able to construct a program from new input-output examples. The results of the testing also seem to support that the R3NN is able to construct new programs for tasks that it had never observed during training.

While the discussion that follows focuses on program generation, other applications, such as input-output example anomaly detection and/or input-output example classification, as previously discussed, may be possible based on the encoding.

The act of programming (e.g., developing a procedure to accomplish a task) is a demonstration of the reasoning abilities of the human mind. Program induction is considered one of the fundamental problems in machine learning and/or artificial intelligence. Recent progress on deep learning has led to the proposal of a number of promising neural network architectures for program induction. Many of these models are inspired by computation subsystems (CPU, random access memory (RAM), GPU) or common data structures used in some techniques (e.g., a memory stack). A common thread in program induction is to specify the atomic operations of the network in some differentiable form, allowing efficient end-to-end training of a neural controller, and/or to use reinforcement learning to make choices about which operation to perform. While these results are impressive, these approaches have some limitations. The limitations may include one or more of: (a) they are computationally expensive and hard to train, (b) a model has to be trained for each task (program) separately, and (c) it is hard to interpret or verify the correctness of the learned mapping (as it is defined by a neural network). While some recently proposed methods are able to learn interpretable programs, such methods still need to learn a separate neural network model for each individual task.

At least partially motivated by the need for model interpretability and scalability to multiple tasks, embodiments discussed herein may address a problem of program synthesis. Program synthesis, the problem of automatically constructing programs that are consistent with a given specification, has long been a subject of research in computer science. This interest has been reinvigorated in recent years by the development of methods for learning programs in various domains, ranging from low-level bit manipulation code to data structure manipulations and regular expression based string transformations.

Some of the recently proposed approaches for program synthesis operate by searching the space of programs in a DSL instead of arbitrary Turing-complete languages. This hypothesis space of possible programs is huge (potentially infinite) and searching over it is a challenging problem. Several search approaches, including enumerative, stochastic, constraint-based, and version-space algebra based algorithms, have been developed to search over the space of programs in the DSL, which support different kinds of specifications (examples, partial programs, natural language, or the like) and domains. These approaches not only require significant engineering and research effort to develop carefully-designed heuristics for efficient search, but also have limited applicability and can only synthesize programs of limited sizes and types.

Embodiments herein include a technique, sometimes called neuro-symbolic program synthesis (NSPS), that learns and/or is trained to generate a program incrementally without the need for an explicit search. Once trained, NSPS may (e.g., automatically) construct a computer program that is consistent with a set (e.g., any set) of input-output examples provided at test, run, and/or training time. Embodiments may include two neural architectures. The first neural architecture, sometimes called the cross-correlation input/output (I/O) network, produces an encoded representation of a given set of input-output examples. The second neural architecture, the R3NN, given the encoded representation of the input-output examples, synthesizes a program (e.g., an executable or compilable program) by incrementally expanding partial programs. The R3NN, in one or more embodiments, employs a tree-based neural architecture that sequentially constructs a parse tree by selecting which non-terminal symbol to expand using rules from a context-free grammar (e.g., the DSL). This generative process over trees may be conditioned on an encoding of input-output example pairs that provides the specification of the program for which the neural network is searching. A goal may be that, provided some input values, the program found by the model reproduces the provided output values when run on the input values.

The efficacy of one or more embodiments, as previously discussed, may be tested by applying one or more approaches to the rich and complex domain of regular expression-based syntactic string transformations. The DSL used may be based on the one used by FlashFill, a Programming-By-Example (PBE) system in Microsoft® Excel, from Microsoft Corporation of Redmond, Wash., United States. Given multiple input-output examples of strings, the task is to synthesize a program built on regular expressions to perform the desired string transformation indicated by the given input-output examples.

FIG. 1 illustrates, by way of example, a flow diagram of an embodiment of a technique 100 of generating a program based on input-output examples. An example task that can be expressed in a DSL is shown in FIG. 1, which also shows the DSL. The technique 100 as illustrated includes input examples 110, output examples 120, a DSL 130, and a program 140 generated based on the input examples 110 and output examples 120. Each of the output examples 120 has a corresponding input example 110. For example, “Charles, W.” is the desired output in response to input of “William Charles Henry”.

An evaluation methodology previously discussed seems to indicate that embodiments of the NSPS discussed herein are able to construct programs for known tasks from new input-output examples and to construct completely new programs that were not observed during training. Some features of the embodiments discussed herein may include: a novel NSPS technique to encode neural search over a space of programs defined using a DSL, an R3NN model that encodes and expands partial programs in the DSL, where each node may include a global representation of the program tree, a novel cross-correlation based neural architecture for learning a representation (e.g., a continuous representation) of sets of input-output examples, and/or evaluation of the NSPS approach on the complex domain of regular expression based string transformations.

First, an overview of an approach is provided, including a formal definition of the DSL-based program synthesis problem that may be solved by one or more embodiments. Given a DSL L, automatically construct a synthesis algorithm A, such that, given a set of input-output examples, {(i1, o1), . . . , (in, on)}, A returns a program P∈L that conforms to the input-output examples, as in Equation 1:

∀j, 1≤j≤n: P(ij)=oj  Equation 1
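
Equation 1 simply requires that the synthesized program reproduce every paired output from its input. A minimal sketch of that consistency check follows; the program is represented here as an ordinary Python callable, which is an illustrative simplification of a DSL program.

```python
def consistent(program, examples):
    """Return True when program(i_j) == o_j for every input-output example (Equation 1)."""
    return all(program(i) == o for i, o in examples)

# Hypothetical example set and candidate program.
examples = [("william charles", "WILLIAM CHARLES"), ("jan", "JAN")]
print(consistent(str.upper, examples))  # True
```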

An example of syntax and semantics of a DSL for string transformation is shown in FIG. 1 (the program 140). The DSL 130 corresponds to a large subset of the FlashFill DSL (except conditionals), and allows for a richer class of substring operations than FlashFill. A DSL program takes as input a string, v, and returns an output string, o. The top-level string expression e is a concatenation of a finite list of substring expressions f1, . . . , fn. A substring expression f can either be a constant string s or a substring expression, which is defined using two position logics pl (left) and pr (right). A position logic corresponds to a symbolic expression that evaluates to an index in the string. A position logic p can either be a constant position k or a token match expression (r, k, Dir), which denotes the start or end of the kth match of token r in input string v. A regex token can either be a constant string s or one of 8 regular expression tokens: p (ProperCase), C (CAPS), l (lowercase), d (Digits), a (Alphabets), an (Alphanumeric), ∧ (StartOfString), and $ (EndOfString). This is but one example of a string expression DSL. DSLs for other types of programs, such as numeric manipulation (e.g., to perform mathematical operations) and/or string or other symbol manipulation, may be used in place of the DSL.

A DSL program for the name transformation task shown in FIG. 1 that is consistent with the examples 110 and 120 is provided as the program 140. The program 140 concatenates the following 4 strings: i) substring between the end of last whitespace and end of string, ii) constant string “,”, iii) first character of input string, and iv) constant string “.”.
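
The same four-part concatenation can be mirrored in ordinary code for illustration. The sketch below is not the DSL program itself (which would express the positions with token match expressions rather than Python string methods), and the trailing space after the comma is an assumption made to match the style of the outputs shown in FIG. 1.

```python
def name_transform(v: str) -> str:
    """Illustrative mirror of program 140: a concatenation of four strings."""
    part_i = v[v.rfind(" ") + 1:]   # i) substring from end of last whitespace to end of string
    part_ii = ", "                  # ii) constant string (written here with a trailing space)
    part_iii = v[0]                 # iii) first character of the input string
    part_iv = "."                   # iv) constant string "."
    return part_i + part_ii + part_iii + part_iv

print(name_transform("John Smith"))  # "Smith, J."
```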

A DSL can be considered a context-free grammar with terminal and non-terminal symbols S and production rules R that allow representing programs and partial programs as tree structures (see, e.g., FIGS. 6A and 6B for example partial program trees). A partial program tree has two types of nodes (apart from the root node): leaf nodes and inner non-leaf nodes. A leaf node represents a symbol in the DSL, whether non-terminal or terminal. An inner non-leaf node represents a particular production rule of the DSL, and the number of children of the non-leaf node is equivalent to the arity of the right hand side of the rule it represents. A partial program tree can be iteratively expanded by applying production rules (e→e op2 e) to the non-terminal leaf nodes. Accordingly, a partial program tree represents a program obtained after a number of steps into construction. Program construction is complete once all leaves of the tree represent terminal symbols (such that the tree cannot be further expanded with production rules); such a complete tree is herein referred to simply as the “program tree,” and represents a completed program under the DSL that is ready for execution.

A naive way to perform a search over the programs in a given DSL is to begin with the start symbol of the DSL as the root node, and then iteratively expand the partial tree by randomly choosing non-terminal leaf nodes (also simply “non-terminals”) to expand with randomly chosen production rules until a derivation with only terminal leaf nodes (also simply “terminals”), corresponding to a complete program tree, is reached. In accordance with various embodiments, by contrast, the program space is searched more efficiently with a generative model (herein also “program-generation model”) that assigns probabilities to different non-terminals in a partial derivation and corresponding expansions to guide the search for complete derivations. The generative model is implemented with a neural network, and is conditioned on input-output examples encoded themselves by a neural network. The generative model and the input-output encoder, which collectively constitute the synthesis algorithm A, may be trained end-to-end on a training set of programs in the DSL together with their corresponding input-output examples.
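
The naive random search can be sketched as a recursive expansion over a context-free grammar. The toy grammar below is a hypothetical stand-in for a DSL; the point is only the expand-until-all-leaves-are-terminal loop, not the particular rules.

```python
import random

# Toy context-free grammar: each non-terminal maps to a list of possible right-hand sides.
GRAMMAR = {
    "e": [["f"], ["f", "Concat", "e"]],
    "f": [["ConstStr"], ["SubStr"]],
}

def random_program(symbol="e"):
    """Expand non-terminals with randomly chosen production rules until only
    terminal leaves remain; returns the flat sequence of terminal leaves."""
    if symbol not in GRAMMAR:              # terminal symbol: nothing left to expand
        return [symbol]
    rhs = random.choice(GRAMMAR[symbol])   # randomly chosen production rule
    leaves = []
    for child in rhs:
        leaves.extend(random_program(child))
    return leaves

print(random_program())
```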

Encoding of input-output examples is presented first, followed by a discussion of program generation using the input-output encoding.

The encoded input-output examples may provide an at least partial specification for the output program. An encoding of the input-output examples may aid the success of the program generator (discussed elsewhere). The encoding may be domain-specific, since different DSLs have different inputs (some may operate over integers, real numbers, strings, and/or a combination thereof). An encoding may be adapted to the input-output symbol strings of the example symbol strings, such as shown in FIG. 1. Different ways of conditioning program search on the learned input-output example encodings are also provided.

In the example of a string manipulation program (e.g., that shown in FIG. 1), there are at least two types of information that may be extracted from input-output examples: 1) constant strings (e.g., “@domain.com”, “.”, or any other constant string) which appear in all output examples; and 2) substring indices in the input, where the index might be further defined by a regular expression. These indices determine which parts of the input are also present in the output. In earlier “hand-engineered” systems, such as FlashFill, this information was extracted from the input-output examples by performing a longest common substring (LCS) technique, a dynamic programming technique that finds matching substrings in string pairs. To extract constant strings, FlashFill runs LCS on every output string in the I/O set to get a set of constant string candidates. Then FlashFill takes the intersection of the constant string candidates produced by every output string pair, giving the set of constant strings that are consistent for the entire I/O set. A similar procedure is done for extracting substring indices, except that LCS is run on input-output string pairs rather than just output strings. A difficulty with this approach may include finding substring indices where those indices are specified by regular expressions (regex), since LCS only operates over characters and not regex tokens. Therefore, FlashFill simply tries every possible regex that can be used at substring boundaries and exhaustively searches for one which is the most “general”, where generality is specified by hand-engineered heuristics.
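
For reference, the longest-common-substring computation that such hand-engineered systems rely on can be written as a standard dynamic program; the sketch below is a textbook formulation, not FlashFill's actual implementation.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Classic O(len(a)*len(b)) dynamic program: dp[i][j] is the length of the
    longest common suffix of a[:i] and b[:j]."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best_len, best_end = 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return a[best_end - best_len:best_end]

print(longest_common_substring("john@domain.com", "mary@domain.com"))  # "@domain.com"
```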

An encoding may extract a sort of generalized LCS that operates not only over the specific characters of the input string but also over regular expression tokens that match parts of the input string. Instead of hand-designing a complicated technique to do this, a neural network based architecture may be trained to extract and produce representations of the likely regular expressions given input-output examples.

FIG. 2 illustrates, by way of example, a block diagram of a portion of an encoding system 200. The system 200 as illustrated includes input examples 202, output examples 204, neural networks 206A and 206B, an input representation 208, an output representation 210, processing circuitry 212A, 212B, 212C, and 212D, an input feature vector 214, an output feature vector 216, a complete input feature vector 220, and a complete output feature vector 222.

A first level of input-output example encoding may include encoding using the neural networks 206A-B. In one or more embodiments, the neural networks 206A-B may include long short term memory (LSTM) neural networks. The input to the neural network 206A is the input examples 202. The input to the neural network 206B is the output examples 204. The input examples 202 may include the input examples 110, and the output examples 204 may include the output examples 120. The system 200 may run two separate deep bidirectional long short term memory (LSTM) neural networks (e.g., 206A-B).

A topmost hidden representation at every time step (e.g., the input representation 208 and the output representation 210) may be concatenated (or otherwise processed, such as by the processing circuitry 212A-B) to produce a feature vector (e.g., the input feature vector 214 and the output feature vector 216). In the example of concatenating, the input feature vector 214 and the output feature vector 216 may be 4HT-dimensional per I/O pair, where T is the maximum string length for any input or output, and H is the topmost neural network hidden dimension.

Each of the input feature vectors 214 and output feature vectors 216 may be concatenated (or otherwise processed, such as by the processing circuitry 212C-D), respectively, to produce the complete input feature vector 220 and the complete output feature vector 222, respectively. The complete input feature vector 220 and complete output feature vector 222 may provide an encoded vector representation of each of the input examples 110 and the output examples 120 in the input-output example set. This encoded representation may have little or no knowledge about what operations are being performed over the input examples 110 to produce the output examples 120 (e.g., substring, constant, mathematical operation, regular expression, etc.), which might make it difficult to discover substring indices, especially the ones based on regular expressions.

Each of the input examples 202 may be processed one character at a time by the neural network 206A. Each of the input examples 202 may include a first character and a last character. For example, the input example “Barack Rogers” includes a first character “B” and a last character “s”. The case of the character may or may not be important, depending on the output desired. In the example provided in FIG. 1, case is important. The neural network 206A may process the input example 202 one character at a time in a “forward pass” and/or one character at a time in a “backward pass”. In the forward pass, the neural network 206A processes the input example 202 one character at a time, from the first character to the last character. In the backward pass, the neural network 206A processes the input example 202 one character at a time, from the last character to the first character. In the example of the input example 202 being “Barack Rogers”, the neural network 206A processes the “B”, “a”, “r”, “a”, “c” . . . “e”, “r”, and “s” in that order to produce a forward input feature vector and, in the backward pass, the neural network 206A processes the “s”, “r”, “e”, “g”, “o” . . . “r”, “a”, and “B” in that order to produce a backward input feature vector. The input feature vector 214 may include the forward input feature vector, the backward input feature vector, and/or a combination (e.g., concatenation or other combination, such as an addition or average) of the forward input feature vector and the backward input feature vector.

Each of the output examples 204 may be processed, in a manner similar to the input examples 202, one character at a time by the neural network 206B. Each of the output examples 204 may include a first output character and a last output character. For example, the output example “Rogers, B.” includes a first character “R” and a last character “.”. The case of the character may or may not be important. The neural network 206B may process the output example 204 one character at a time in a “forward pass” and one character at a time in a “backward pass”. In the forward pass, the neural network 206B processes the output example 204 one character at a time, from the first character to the last character. In the backward pass, the neural network 206B processes the output example 204 one character at a time, from the last character to the first character. In the example of the output example 204 being “Rogers, B.”, the neural network 206B processes the “R”, “o”, “g”, “e”, “r” . . . “ ”, “B”, and “.” in that order to produce a forward output feature vector and, in the backward pass, the neural network 206B processes the “.”, “B”, “ ”, “,”, “s” . . . “g”, “o”, and “R” in that order to produce a backward output feature vector. The output feature vector 216 may include the forward output feature vector, the backward output feature vector, and/or a combination (e.g., concatenation or other combination, such as an addition or average) of the forward output feature vector and the backward output feature vector. Concatenation results, for a maximum string length of T (which corresponds to T time steps in the LSTM encoding) and a topmost LSTM hidden dimension H, in a 2HT-dimensional input representation for each input string and a 2HT-dimensional output representation for each output string.
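
A minimal sketch of this first-level encoding, written here with PyTorch as an illustrative framework; the character vocabulary, dimensions T and H, and padding scheme are placeholder choices and not part of any embodiment.

```python
import torch
import torch.nn as nn

VOCAB = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ.,")}
T, H = 16, 32                     # placeholder maximum string length and hidden dimension

class CharEncoder(nn.Module):
    """Bidirectional character-level LSTM; concatenating the topmost hidden
    states over all T time steps yields a 2*H*T-dimensional string encoding."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB) + 1, 16, padding_idx=len(VOCAB))
        self.lstm = nn.LSTM(16, H, batch_first=True, bidirectional=True)

    def forward(self, s: str) -> torch.Tensor:
        ids = [VOCAB[c] for c in s[:T]]
        ids += [len(VOCAB)] * (T - len(ids))            # pad to T time steps
        x = self.embed(torch.tensor([ids]))             # (1, T, 16)
        out, _ = self.lstm(x)                           # (1, T, 2*H): forward and backward states
        return out.reshape(-1)                          # flatten to a 2*H*T-dimensional vector

input_encoder, output_encoder = CharEncoder(), CharEncoder()   # two separate networks, as in FIG. 2
input_vec = input_encoder("Barack Rogers")                     # input feature vector
output_vec = output_encoder("Rogers, B.")                      # output feature vector
print(input_vec.shape, output_vec.shape)                       # torch.Size([1024]) each
```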

An LSTM network is a type of neural network that contains LSTM units. An LSTM unit includes no activation function within its recurrent components. The LSTM unit generally includes one or more gates that control a flow of data into/out of the unit. The gates may include an input gate, a forget gate, and/or an output gate. An input gate controls whether a new value flows into the unit. A forget gate controls whether a value remains in the unit. An output gate controls whether a value is used to compute an output of the unit.
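
Written out explicitly, a single LSTM time step looks roughly as follows; this is a textbook NumPy formulation with randomly initialized placeholder weights, shown only to make the gate roles concrete.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p holds weight matrices and biases for the
    input (i), forget (f), output (o), and candidate (g) computations."""
    i = sigmoid(p["Wi"] @ x + p["Ui"] @ h_prev + p["bi"])   # input gate: admit a new value
    f = sigmoid(p["Wf"] @ x + p["Uf"] @ h_prev + p["bf"])   # forget gate: keep the old value
    o = sigmoid(p["Wo"] @ x + p["Uo"] @ h_prev + p["bo"])   # output gate: expose the cell value
    g = np.tanh(p["Wg"] @ x + p["Ug"] @ h_prev + p["bg"])   # candidate cell value
    c = f * c_prev + i * g                                  # cell state carried forward
    h = o * np.tanh(c)                                      # unit output
    return h, c

# Hypothetical sizes: 8-dimensional input, 4-dimensional hidden/cell state.
rng = np.random.default_rng(1)
p = {k: rng.normal(size=(4, 8)) for k in ("Wi", "Wf", "Wo", "Wg")}
p.update({k: rng.normal(size=(4, 4)) for k in ("Ui", "Uf", "Uo", "Ug")})
p.update({k: np.zeros(4) for k in ("bi", "bf", "bo", "bg")})
h, c = lstm_step(rng.normal(size=8), np.zeros(4), np.zeros(4), p)
print(h.shape, c.shape)  # (4,) (4,)
```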

Concatenation is linking things (e.g., numbers, strings, symbols, characters, or the like) together in a chain or series. For example, a concatenation of the string “William” and the string “Charles” may include “WilliamCharles” or “CharlesWilliam”.

FIG. 3 illustrates, by way of example, a flow diagram of an embodiment of a system 300 for determining a cross-correlation of input-output examples. The system 300 as illustrated includes an input example 110, an output example 120, a complete input feature vector 220, a complete output feature vector 222, and an encoded vector 310.

The determined cross-correlation between an input example and an output example may be an encoding of the input-output examples. The cross-correlation may help discover input example substrings that are copied to the output. The cross-correlation may be computed between each input example and output example pair.

The input example 110 is featurized (indicated by arrow 302A) to generate the complete input feature vector 220. Featurizing may include operating on the input example 110 using the neural network 206A, processing circuitry 212A, and/or processing circuitry 212C as described and illustrated with regard to FIG. 2. The output example 120 is featurized (indicated by arrow 302B) to generate the complete output feature vector 222. Featurizing may include operating on the output example 120 using the neural network 206B, processing circuitry 212B, and/or processing circuitry 212D as described and illustrated with regard to FIG. 2.

The complete input feature vector 220 may include a forward input feature vector concatenated with a backward input feature vector. The complete output feature vector 222 may include a forward output feature vector concatenated with a backward output feature vector. In computing the cross-correlation, a discrete convolution of the complete input feature vector 220 and the complete output feature vector 222 may be performed. A convolution is an operation on an input example and a corresponding output example of an input-output example pair that produces an encoding that is a modified version of the input example and the output example. The convolution provides an element-wise dot product, for example, of the input example and the output example as a function of an amount that the input example and/or output example is translated.

The convolution includes padding (as indicated by arrow 304A) the complete input feature vector 220. Padding the complete input feature vector 220 may include adding a number of null characters (zeros in the example of FIG. 3) to the complete input feature vector 220 so that the complete input feature vector 220 and the complete output feature vector 222 include a same number of discrete values in a specific alignment. The padded input feature vector and padded output feature vector are aligned, at operation 306. A symbol-by-symbol operation 308 is performed for each symbol in the padded input feature vector and a corresponding aligned output feature vector. In one or more embodiments, the operation 308 may include a dot product, multiplication, division, addition, subtraction, concatenation, running an LSTM neural network over the values, or other operation. For each alignment, the values determined at operation 308 may be combined (as indicated by symbol a) and used as an element in the encoded vector 310. The symbol a represents one or more of a concatenation, sum, average, running an LSTM neural network over the values, or other operation on the values determined at operation 308. For T characters (in our example, T=3), there are (2T−1) such possible alignments.

The outputs of the neural networks 206A-B and/or the processing circuitry 212A-D may be used as an input to the encoder system 300. For each input-output example pair (e.g., “Peter T Gates” and “Gates, P.” are an input-output example pair, see FIG. 1), the complete output feature vector 222 is “slid” over the complete input feature vector 220 (or vice versa). A dot product or other value may be computed between the respective position representations. A result (sum, average, concatenation, a combination thereof, or the like) may be determined for each overlapping time step. The determined results for each time step may then be concatenated to form a (2T−1)-dimensional vector encoding for each example pair. There are 2T−1 possible alignments in total between a complete input feature vector and a complete output feature vector.
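
A minimal NumPy sketch of this sliding comparison, using a dot product over the overlap at each alignment; the per-time-step representations are random placeholders, and T and the feature dimension are arbitrary here.

```python
import numpy as np

def sliding_cross_correlation(inp, out):
    """inp and out are (T, D) arrays of per-time-step representations. For each
    of the 2*T-1 relative alignments, take the dot product of the overlapping
    rows, giving one value per alignment and a (2*T-1)-dimensional encoding."""
    T = inp.shape[0]
    encoding = []
    for shift in range(-(T - 1), T):                  # 2*T-1 possible alignments
        if shift >= 0:
            overlap = np.sum(inp[shift:] * out[:T - shift])
        else:
            overlap = np.sum(inp[:T + shift] * out[-shift:])
        encoding.append(overlap)
    return np.array(encoding)

# Hypothetical example: T=3 time steps with 4-dimensional representations.
rng = np.random.default_rng(0)
enc = sliding_cross_correlation(rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
print(enc.shape)  # (5,), i.e., 2*3-1 alignments
```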

In a summed cross-correlation encoder, the symbol a represents addition. In a diffused cross-correlation encoder, the symbol a represents a concatenation. In the diffused cross-correlation encoder, the encoded vector 310 has dimensionality of (2T−1)·T for each input-output example pair (for at most T non-zero values in each of the (2T−1) alignments). An augmented diffused cross-correlation encoder may include combining the output of each character position of the diffused cross-correlation encoder with the character embedding at this position. Then an LSTM neural network is run over the combined features to extract a 4*H-dimensional vector for both the input examples 110 and the output examples 120. The LSTM neural network output may be concatenated with the output of the diffused cross-correlation encoder, forming a (4*H+T*(T−1))-dimensional feature vector for each input-output example pair.

An LSTM-sum cross-correlation encoder, instead of computing the element-wise dot product of the aligned input-output representations, runs a bidirectional (including forward and backward passes) LSTM neural network over the concatenated feature blocks of each alignment of input and output representations (e.g., for the first alignment, over the vector [A′,B′,C′,0,0,0,0,D′,E′,F′]). Each alignment may be represented by the 2H-dimensional bi-directional LSTM hidden representation of the final time step (from both directions). Such an encoder includes 2H·(2T−1) elements in the distributed representation for each input-output example.

FIG. 4 illustrates, by way of example, a block diagram of an embodiment of a method 400 for encoding input-output examples, such as the input examples 110 and output examples 120. The method 400 as illustrated includes: processing an input example of input-output examples one character at a time to produce an input feature vector, at operation 410; processing an output example of the input-output examples associated with the input example one character at a time to produce an output feature vector, at operation 420; determining (1) a cross-correlation between the input feature vector and the output feature vector or (2) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector, at operation 430; and using the determined cross-correlation or previously computed vector to generate a compilable or executable program consistent with the input example and the output example, at operation 440. The operation 410 may include using a first LSTM neural network. The operation 420 may include using a second LSTM neural network.

The input example may include a plurality of characters including a first input character and a last input character. The output example may include a plurality of characters including a first output character and a last output character. The operation 410 may include traversing, using the first LSTM neural network, the input example from the first input character to the last input character. The operation 420 may include traversing, using the second LSTM neural network, the output example from the first output character to the last output character.

The input feature vector may include a concatenation or addition of an output from the first LSTM neural network over each character of the input example. The output feature vector may include a concatenation or addition of an output from the second LSTM neural network over each character of the output example. The input feature vector may be a forward input feature vector and the output feature vector may be a forward output feature vector. The method 400 may further include processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector. The method 400 may further include processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector.

The operation 430 may include determining a cross-correlation between (1) a concatenated input vector including a concatenation of the forward input vector and the backward input vector and (2) a concatenated output vector including a concatenation of the forward output vector and the backward output vector. The operation 430 may include convoluting the concatenated input vector and the concatenated output vector to produce a vector of elements. The method 400 may further include performing an operation including one or more of a sum, average, and concatenation of values of each element of the elements of the vector.

The method 400 may further include forming the first and second LSTM neural networks by training the first and second LSTM neural networks using programs limited to a domain specific language (DSL) and a plurality of I/O examples consistent with each of the programs. The DSL may comprise string, integer, real number, or other symbol transformations.

FIG. 5A illustrates, by way of example, a block diagram of an embodiment of a workflow for training neural networks for program synthesis. To generate a large training set (e.g., including many programs with ten to twenty input-output examples each), a program sampler 500 may uniformly sample programs from the DSL 502 and generate well-formed inputs (e.g., input strings) that satisfy the pre-conditions of the programs based on suitable input generation rules 504. The corresponding outputs (e.g., output strings) are obtained by running the programs on the inputs. The resulting input-output examples 506 are passed through the input-output encoder 508, which generates distributed representations therefrom. These distributed representations of the input-output examples, along with the symbols and production rules of the DSL 502, are passed to the program-generation model 510 (e.g., in accordance with various embodiments, an R3NN model as described in detail below), which encodes and iteratively expands partial program trees 512 to incrementally build complete program trees. A measure of the difference between a program generated by the program-generation model 510 based on the input-output examples and the corresponding program sampled by the program sampler 500 is used as feedback to adjust parameters of the neural networks used in the input-output encoder 508 and the program-generation model 510. Suitable difference measures and techniques for training neural networks are well-known to those of ordinary skill in the art. Network weights for the program-generation model 510 and the input-output encoder 508 may be trained, for example, using the Adaptive Moment Estimation (Adam) variant of the stochastic gradient descent algorithm.
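
The overall training loop of FIG. 5A can be sketched as follows. The two modules below are trivial placeholders standing in for the input-output encoder 508 and the program-generation model 510, and the sampler returns random stand-ins for a program and its examples; only the loop structure and the use of Adam reflect the workflow described above.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the encoder 508 and the generation model 510.
io_encoder = nn.Linear(10, 16)
generation_model = nn.Linear(16, 4)
optimizer = torch.optim.Adam(
    list(io_encoder.parameters()) + list(generation_model.parameters()), lr=1e-3)

def sample_program_and_examples():
    """Stand-in for the program sampler 500 and input generation rules 504:
    returns a target 'program' label and a vector standing in for its encoded
    input-output examples."""
    return torch.randint(0, 4, (1,)), torch.randn(1, 10)

for step in range(100):
    target, io_examples = sample_program_and_examples()
    logits = generation_model(io_encoder(io_examples))
    loss = nn.functional.cross_entropy(logits, target)   # difference measure used as feedback
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                                      # Adam update of both networks' weights
```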

FIG. 5B illustrates, by way of example, a block diagram of an embodiment of a workflow for using trained neural networks to synthesize a program in a given DSL based on input-output examples. The workflow uses the input-output encoder 508 and program-generation model 510, once trained for a particular DSL 502, to synthesize programs in that DSL based on input-output examples. A set of input-output examples 520 specifying the task to be performed by the program is fed into the input-output encoder 508 to generate a distributed representation therefrom. The encoded input-output examples are provided as input to the program-generation model 510, which, in the same manner as in the training phase described with reference to FIG. 5A, but now with fixed values of the neural-network parameters, iteratively expands a partial program tree 512, beginning with the root node, until all leaf nodes of the tree are terminals.

Recursive-Reverse-Recursive Neural Network (R3NN)

In various embodiments, the program-generation model 510 uses an R3NN to provide an efficient way of assigning probabilities to every valid expansion in the current partial program. Herein, a valid expansion is specified by two components: the production rule used, and the position of the non-terminal leaf node to which the production rule is applied relative to every other node in the tree. To account for the first component, a separate distributed representation for each production rule is maintained. The second component is handled using an architecture in which each node of the partial tree encodes global information about every other node in the tree. In brief, the R3NN assigns an initial distributed representation to each leaf node, and then performs a recursive pass through the tree from the leaves to the root node, followed by a reverse-recursive pass from the root back to the leaf nodes, resulting in a “global leaf representation” for each leaf node. The probability of a given expansion is calculated from the global leaf representation of the respective non-terminal leaf node and the distributed representation of the respective production rule, e.g., as a quantity proportional to the inner product between the production rule representation and the global leaf representation of the non-terminal node.

In more detail, the R3NN includes the following parameters for the grammar described by a DSL (which can be any functional DSL, i.e., any DSL without control flow (via loops, conditionals, etc.) and without stateful variables):

1. For every symbol s∈S, an M-dimensional representation θ(s) ∈ ℝ^M.

2. For every production rule r∈R, an M-dimensional representation ω(r) ∈ ℝ^M.

3. For every production rule r∈R, a deep neural network f_(r) which takes as input a vector x ∈ ℝ^(Q·M), with Q being the number of symbols on the right-hand side of the production rule r, and outputs a vector y ∈ ℝ^M. The input to the production-rule network f_(r) is a concatenation of the distributed representations of each of its right-hand-side (RHS) symbols, and the output is a distributed representation for the left-hand-side (LHS) symbol.

4. For every production rule r∈R, an additional deep neural network g_(r) which takes as input a vector x ∈ ℝ^M and outputs a vector y ∈ ℝ^(Q·M). The deep neural network g_(r) can be thought of as a reverse production-rule network that takes as input a distributed representation of the LHS symbol and produces a concatenation of the distributed representations of the RHS symbols of the production rule.

FIGS. 6A and 6B illustrate, by way of example, block diagrams of an embodiment of a recursive pass and a reverse-recursive pass through an example partial program tree, as may be used in a determination of expansion probabilities. Let E be the set of all valid expansions in a partial program tree T, let L be the current leaf nodes of T and N the current non-leaf (i.e., rule) nodes of T, and let s(l) be the symbol of leaf l∈L and r(n) represent the production rule of non-leaf node n∈N. To compute the probability distribution over the set E, the R3NN first computes a distributed global leaf representation for each leaf node (that is, a representation for each leaf node that contains global tree information).

With reference to FIG. 6A, the R3NN initially assigns to each leaf node l∈L in the tree its distributed representation θ(s(l)). A recursive bottom-to-top (RHS-to-LHS) pass through the tree is then performed by going up the tree and applying f_(r(n)) to every non-leaf node n∈N on its right-hand node representations. At each step, the networks f_(r(n)) produce a node representation that is input into the parent's rule network. The process is repeated iteratively until the root node is reached. At that point, the root node has an associated fixed-dimensionality global tree representation φ(root). This representation, however, has lost any notion of tree position. To solve this problem, the R3NN now performs what is effectively a reverse-recursive pass that starts at the root node with φ(root) as input and moves towards the leaf nodes.

With reference to FIG. 6B, more concretely, the root node representation φ(root) resulting from the recursive pass is supplied as input into the rule network g_(r(root)) for the production rule r(root) that is applied to the start symbol in T. This produces a representation φ_(c) for each RHS node c of r(root). If c is a non-leaf node, the procedure is repeated for node c, i.e., φ_(c) is input into g_(r(c)) to get representations φ_(cc) for every RHS node cc of r(c), and so on. If c is a leaf node, the leaf representation φ_(c) now has an information path to φ(root) and thus to every other leaf node in the tree. Accordingly, once the reverse-recursive pass is complete, the resulting distributed representation φ_(l) for every leaf node l contains global tree information (and is therefore a global leaf representation). While the initial leaf representations φ(l₁) and φ(l₂) are equal for leaf nodes that have the same symbol type, the global leaf representations φ_(l₁) and φ_(l₂) are generally not equal even if they have the same symbol type, because the respective leaves are at different positions in the tree.
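
The two passes can be sketched compactly. In the sketch below, the production-rule networks f_(r) and g_(r) are reduced to single linear maps with a tanh nonlinearity, and the grammar has a single hypothetical rule; a real R3NN would use deeper networks and the full DSL grammar.

```python
import numpy as np

M = 8                                   # placeholder representation dimension
rng = np.random.default_rng(0)

class Node:
    def __init__(self, symbol, rule=None, children=()):
        self.symbol, self.rule, self.children = symbol, rule, list(children)
        self.rep = None                 # set by the recursive (bottom-up) pass
        self.glob = None                # set by the reverse-recursive (top-down) pass

# Hypothetical grammar fragment: one rule "e -> f f" and leaf symbols "f".
theta = {"e": rng.normal(size=M), "f": rng.normal(size=M)}   # symbol representations theta(s)
f_r = {"e->ff": rng.normal(size=(M, 2 * M))}                 # production-rule network (linear here)
g_r = {"e->ff": rng.normal(size=(2 * M, M))}                 # reverse production-rule network

def recursive_pass(node):
    """Bottom-up: leaves get theta(symbol); each non-leaf applies f_r to the
    concatenation of its children's representations, up to phi(root)."""
    if not node.children:
        node.rep = theta[node.symbol]
    else:
        kids = np.concatenate([recursive_pass(c) for c in node.children])
        node.rep = np.tanh(f_r[node.rule] @ kids)
    return node.rep

def reverse_recursive_pass(node, incoming):
    """Top-down: g_r splits the incoming representation among the children,
    so every leaf's global representation carries whole-tree information."""
    node.glob = incoming
    if node.children:
        pieces = np.split(np.tanh(g_r[node.rule] @ incoming), len(node.children))
        for child, piece in zip(node.children, pieces):
            reverse_recursive_pass(child, piece)

tree = Node("e", rule="e->ff", children=[Node("f"), Node("f")])
root_rep = recursive_pass(tree)               # phi(root): fixed-dimensionality tree representation
reverse_recursive_pass(tree, root_rep)        # assigns a global leaf representation to each leaf
print([child.glob.shape for child in tree.children])   # [(8,), (8,)]
```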

Once the global leaf representations φ_(l) have been computed, it is straightforward to determine scores for all possible expansions e∈E. For any given expansion e, let e.r be the expansion type (i.e., the production rule r∈R that e applies) and let e.l be the leaf node l that e.r is applied to. The score of an expansion may then be calculated as a function of the global leaf representation φ_(e.l) and the distributed representation ω(e.r). For example, in some embodiments, the score is calculated as the product z_(e)=φ_(e.l)·ω(e.r). The probability distribution over the set of expansions may be a normalized exponential distribution over the scores, that is, the probability of a given expansion e may be the exponentiated score, normalized by the sum of exponentiated scores over all expansions:

${\pi (e)} = {\frac{e^{z_{e}}}{\Sigma_{{e\; \prime} \in E}e^{z}e\; \prime}.}$

In some embodiments, to reduce the minimum length that information has to propagate between nodes in the tree, the global leaf representations are processed with a bidirectional LSTM network (as is known to those of ordinary skill in the art) right before calculating the scores, and the LSTM hidden states, rather than the leaves themselves, are used in the score calculation. The global leaf representations are ordered sequentially from left-most leaf node to right-most leaf node, where each leaf node is treated as a time step for a bidirectional LSTM to process. This processing provides a skip connection between leaf nodes, which potentially reduces the path length that information needs to travel between leaf nodes in the tree.

While the above-described example embodiments refer specifically to the encoding of input and output strings in the DSL of string transformations, LSTM neural networks and cross-correlation encoders employing the principles described above may also be used to encode other types of input-output examples for other DSLs. Further, various modifications of and alternatives to the input-output encoding embodiments described herein may occur to those of ordinary skill in the art. For instance, input-output encoders as described herein can be augmented with additional external memory and/or attention vectors to learn richer distributed representations.

FIG. 7 illustrates, by way of example, a block diagram of an embodiment of a machine 1000 (e.g., a computer system) to implement input-output example encoding and/or program synthesis. One example machine 1000 (in the form of a computer) may include a processing unit 1002, memory 1003, removable storage 1010, and non-removable storage 1012. Although the example computing device is illustrated and described as machine 1000, the computing device may be in different forms in different embodiments. For example, the computing device may instead be a smartphone, a tablet, a smartwatch, or another computing device including the same or similar elements as illustrated and described with regard to FIG. 7. Devices such as smartphones, tablets, and smartwatches are generally collectively referred to as mobile devices. Further, although the various data storage elements are illustrated as part of the machine 1000, the storage may also or alternatively include cloud-based storage accessible via a network, such as the Internet.

Memory 1003 may include volatile memory 1014 and non-volatile memory 1008. The machine 1000 may include—or have access to a computing environment that includes—a variety of computer-readable media, such as volatile memory 1014 and non-volatile memory 1008, removable storage 1010 and non-removable storage 1012. Computer storage includes random access memory (RAM), read only memory (ROM), erasable programmable read-only memory (EPROM) and electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD ROM), Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices capable of storing computer-readable instructions for execution to perform functions described herein.

The machine 1000 may include or have access to a computing environment that includes input 1006, output 1004, and a communication connection 1016. Output 1004 may include a display device, such as a touchscreen, that also may serve as an input device. The input 1006 may include one or more of a touchscreen, touchpad, mouse, keyboard, camera, one or more device-specific buttons, one or more sensors integrated within or coupled via wired or wireless data connections to the machine 1000, and other input devices. The computer may operate in a networked environment using a communication connection to connect to one or more remote computers, such as database servers, including cloud-based servers and storage. The remote computer may include a personal computer (PC), server, router, network PC, a peer device or other common network node, or the like. The communication connection may include a Local Area Network (LAN), a Wide Area Network (WAN), cellular, Institute of Electrical and Electronics Engineers (IEEE) 802.11 (Wi-Fi), Bluetooth, or other networks.

Computer-readable instructions stored on a computer-readable storage device are executable by the processing unit 1002 of the machine 1000. A hard drive, CD-ROM, and RAM are some examples of articles including a non-transitory computer-readable medium such as a storage device. For example, a computer program 1018 may be used to cause processing unit 1002 to perform one or more methods or algorithms described herein.

Additional Notes and Examples

Example 1 includes a device comprising a processor, and a memory device coupled to the processor, the memory device including a program stored thereon for execution by the processor to perform operations, the operations comprising processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

In Example 2, Example 1 may further include, wherein the input example includes a plurality of characters including a first input character and a last input character, the output example includes a plurality of characters including a first output character and a last output character, the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character, and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.

In Example 3, Example 2 may further include, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.

In Example 4, Example 3 may further include, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.

In Example 5, Example 4 may further include, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the operations further comprising processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector, processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector, wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector, wherein determining the cross-correlation includes convoluting the concatenated input vector and the concatenated output vector to produce a vector of elements, and performing an operation including one or more of a sum, average, and concatenation on values of each element of the elements of the vector.

Example 6 includes a method of generating a program using an encoding of input-output examples, the method comprising processing an input example of the input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

In Example 7, Example 6 may further include, wherein the input example includes a plurality of characters including a first input character and a last input character, the output example includes a plurality of characters including a first output character and a last output character, the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character, and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.

In Example 8, Example 7 may further include, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.

In Example 9, Example 8 may further include, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.

In Example 10, Example 9 may further include, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the method further comprising processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector, processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector, and wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector.

In Example 11, Example 10 may further include, wherein determining the cross-correlation includes convoluting the concatenated input vector and the concatenated output vector to produce a vector of elements.

In Example 12, Example 11 may further include performing an operation including one or more of a sum, average, and concatenation of values of each element of the elements of the vector.

In Example 13, Example 12 may further include forming the first and second LSTM neural networks by training, using programs limited to a domain specific language (DSL) and a plurality of input-output examples consistent with each of the programs, the first and second LSTM neural networks.
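
The training of Example 13 may be illustrated, without limitation, by the following sketch. Everything here is an assumption about one plausible setup rather than the disclosed system: sample_program, run_program, and random_input stand for a sampler, an interpreter, and an input generator for the DSL, generator.loss stands for whatever loss the downstream program generator exposes, and cross_correlate refers to the sketch above (with equal-length feature vectors assumed). The point is only that each DSL program is paired with input-output examples consistent with it, and that the first and second LSTM neural networks are formed by training on such pairs.

    import torch

    def make_training_pair(sample_program, run_program, random_input, n_examples=4):
        # A program limited to the DSL, paired with I/O examples consistent with it.
        prog = sample_program()
        xs = [random_input() for _ in range(n_examples)]
        return prog, [(x, run_program(prog, x)) for x in xs]

    def train(encoder_in, encoder_out, generator, dataset, epochs=10, lr=1e-3):
        # encoder_in / encoder_out: the first and second LSTM encoders (e.g., CharEncoder).
        params = (list(encoder_in.parameters()) + list(encoder_out.parameters())
                  + list(generator.parameters()))
        opt = torch.optim.Adam(params, lr=lr)
        for _ in range(epochs):
            for prog, examples in dataset:
                vecs = [cross_correlate(encoder_in(x), encoder_out(y)) for x, y in examples]
                loss = generator.loss(torch.stack(vecs).mean(dim=0), prog)  # placeholder loss
                opt.zero_grad()
                loss.backward()
                opt.step()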

In Example 14, Example 13 may further include, wherein generating the program consistent with the input example and output example includes using a recursive-reverse-recursive neural network (R3NN).
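
The recursive-reverse-recursive generation of Example 14 may be approximated, purely for illustration, by the toy sketch below. It is not the R3NN of the embodiments: it assumes binary production rules, invents the names ToyR3NN, Node, leaves, phi, and psi, and assumes the input-output encoding has already been projected to the tree dimension. A bottom-up pass embeds a partial program tree, a top-down pass conditioned on the input-output encoding distributes global information back down, and each leaf is scored against the DSL's production rules to choose the next expansion.

    import torch
    import torch.nn as nn

    class Node:
        def __init__(self, symbol, children=None):
            self.symbol = symbol             # grammar symbol of the DSL
            self.children = children or []   # empty list => leaf awaiting expansion

    def leaves(node):
        if not node.children:
            yield node
        for child in node.children:
            yield from leaves(child)

    class ToyR3NN(nn.Module):
        def __init__(self, symbols, n_rules, dim=64):
            super().__init__()
            self.ix = {s: i for i, s in enumerate(symbols)}
            self.embed = nn.Embedding(len(symbols), dim)   # per-symbol embedding
            self.up = nn.Linear(2 * dim, dim)              # bottom-up combiner (binary rules assumed)
            self.down = nn.Linear(2 * dim, dim)            # top-down distributor
            self.score = nn.Linear(dim, n_rules)           # expansion score per production rule

        def bottom_up(self, node):
            if not node.children:
                node.phi = self.embed(torch.tensor(self.ix[node.symbol]))
            else:
                kids = torch.cat([self.bottom_up(c) for c in node.children], dim=-1)
                node.phi = torch.tanh(self.up(kids))
            return node.phi

        def top_down(self, node, message):
            node.psi = torch.tanh(self.down(torch.cat([node.phi, message], dim=-1)))
            for child in node.children:
                self.top_down(child, node.psi)

        def forward(self, root, io_encoding):
            # io_encoding: the input-output encoding, assumed projected to `dim`.
            self.bottom_up(root)
            self.top_down(root, io_encoding)
            return torch.stack([self.score(n.psi) for n in leaves(root)])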

In Example 15, at least one of Examples 6-14 may further include, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector.
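
Example 15, in which a previously computed vector is reused when the feature vectors of an earlier input-output example are sufficiently close, may be illustrated by the following sketch of one straightforward realization. EncodingCache, the Euclidean distance test, and the default threshold are assumptions; any encoding function (such as the cross_correlate sketch above) could be passed in.

    import torch

    class EncodingCache:
        def __init__(self, threshold=0.1):
            self.threshold = threshold
            self.entries = []   # list of (input_vec, output_vec, encoding)

        def lookup(self, input_vec, output_vec):
            for in_v, out_v, enc in self.entries:
                if (torch.norm(input_vec - in_v) < self.threshold and
                        torch.norm(output_vec - out_v) < self.threshold):
                    return enc   # feature vectors sufficiently close: reuse prior encoding
            return None

        def store(self, input_vec, output_vec, encoding):
            self.entries.append((input_vec, output_vec, encoding))

    def encode_pair(cache, input_vec, output_vec, cross_correlate):
        cached = cache.lookup(input_vec, output_vec)
        if cached is not None:
            return cached
        enc = cross_correlate(input_vec, output_vec)
        cache.store(input_vec, output_vec, enc)
        return enc

Whether a linear scan as above is adequate depends on how many examples are cached; a nearest-neighbor index could replace it without changing the idea.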

Example 16 includes a non-transitory machine-readable medium including instructions for execution by a processor of the machine to perform operations comprising processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector, processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector, determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively, and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.

In Example 17, Example 16 may further include, wherein the input example includes a plurality of characters including a first input character and a last input character, the output example includes a plurality of characters including a first output character and a last output character, the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character, and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.

In Example 18, Example 17 may further include, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.

In Example 19, Example 18 may further include, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.

In Example 20, Example 19 may further include, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the operations further comprising processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector, processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector, and wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector.

Although a few embodiments have been described in detail above, other modifications are possible. For example, the logic flows depicted in the figures do not require the particular order shown, or sequential order, to achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. Other embodiments may be within the scope of the following claims.

What is claimed is:
1. A device comprising: a processor; and a memory device coupled to the processor, the memory device including a program stored thereon for execution by the processor to perform operations, the operations comprising: processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector; processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector; determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively; and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.
2. The device of claim 1, wherein: the input example includes a plurality of characters including a first input character and a last input character; the output example includes a plurality of characters including a first output character and a last output character; the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character; and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.
3. The device of claim 2, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.
4. The device of claim 3, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.
5. The device of claim 4, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the operations further comprising: processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector; processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector; wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector; wherein determining the cross-correlation includes convoluting the concatenated input vector and the concatenated output vector to produce a vector of elements; and performing an operation including one or more of a sum, average, and concatenation on values of each element of the elements of the vector.
6. A method of generating a program using an encoding of input-output examples, the method comprising: processing an input example of the input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector; processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector; determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively; and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.
7. The method of claim 6, wherein: the input example includes a plurality of characters including a first input character and a last input character; the output example includes a plurality of characters including a first output character and a last output character; the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character; and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.
8. The method of claim 7, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.
9. The method of claim 8, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.
10. The method of claim 9, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the method further comprising: processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector; processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector; and wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector.
11. The method of claim 10, wherein determining the cross-correlation includes convoluting the concatenated input vector and the concatenated output vector to produce a vector of elements.
12. The method of claim 11, further comprising performing an operation including one or more of a sum, average, and concatenation of values of each element of the elements of the vector.
13. The method of claim 6 further comprising: forming the first and second LSTM neural networks by training, using programs limited to a domain specific language (DSL) and a plurality of input-output examples consistent with each of the programs, the first and second LSTM neural networks.
14. The method of claim 13, wherein generating the program consistent with the input example and output example includes using a recursive-reverse-recursive neural network (R3NN).
15. The method of claim 6, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector.
16. A non-transitory machine-readable medium including instructions for execution by a processor of the machine to perform operations comprising: processing an input example of input-output examples, using a first long short term memory (LSTM) neural network, one character at a time to produce an input feature vector; processing an output example associated with the input example in the input-output examples, using a second LSTM neural network, one character at a time to produce an output feature vector; determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes feature vectors less than a threshold distance from the input feature vector and the output feature vector, respectively; and using the determined cross-correlation or previously computed vector, generating a program consistent with the input example and output example.
17. The non-transitory machine-readable medium of claim 16, wherein: the input example includes a plurality of characters including a first input character and a last input character; the output example includes a plurality of characters including a first output character and a last output character; the processing of the input example includes traversing, using the first LSTM neural network, the input example from the first input character to the last input character; and the processing of the output example includes traversing, using the second LSTM neural network, the output example from the first output character to the last output character.
18. The non-transitory machine-readable medium of claim 17, wherein the input feature vector includes a concatenation or addition of an output from the first LSTM neural network over each character of the input example and wherein the output feature vector includes a concatenation or addition of an output from the second LSTM neural network over each character of the output example.
19. The non-transitory machine-readable medium of claim 18, wherein determining (a) a cross-correlation between the input feature vector and the output feature vector or (b) a previously computed vector for a different input-output example that includes a vector sufficiently close to the input feature vector and the output feature vector includes determining the cross-correlation between the input feature vector and the output feature vector.
20. The non-transitory machine-readable medium of claim 19, wherein the input feature vector is a forward input feature vector and the output feature vector is a forward output feature vector, the operations further comprising: processing the input example, using the first LSTM neural network, one character at a time, from the last input character to the first input character to produce a backward input feature vector; processing the output example, using the second LSTM neural network, one character at a time, from the last output character to the first output character to produce a backward output feature vector; and wherein determining a cross-correlation between the input feature vector and the output feature vector includes determining a cross-correlation between (a) a concatenated input vector including a concatenation of the forward input feature vector and the backward input feature vector and (b) a concatenated output vector including a concatenation of the forward output feature vector and the backward output feature vector.